Surgery

Procedure-Aware Surgical Video-Language Pretraining with Hierarchical Knowledge Augmentation
Kun Yuan

Neural Information Processing Systems

Surgical video-language pretraining (VLP) faces unique challenges due to the knowledge domain gap and the scarcity of multi-modal data. This study aims to bridge that gap by addressing textual information loss in surgical lecture videos and the spatial-temporal challenges of surgical VLP. To tackle these issues, we propose a hierarchical knowledge augmentation approach and a novel Procedure-Encoded Surgical Knowledge-Augmented Video-Language Pretraining (PeskaVLP) framework. The knowledge augmentation approach uses large language models (LLMs) to refine and enrich surgical concepts, providing comprehensive language supervision and reducing the risk of overfitting. The PeskaVLP framework combines language supervision with visual self-supervision, constructing hard negative samples and employing a Dynamic Time Warping (DTW) based loss function to effectively learn cross-modal procedural alignment. Extensive experiments on multiple public surgical scene understanding and cross-modal retrieval datasets show that our method significantly improves zero-shot transfer performance and offers a generalist visual representation for further advancements in surgical scene understanding.
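To make the DTW-based alignment idea concrete, below is a minimal sketch of a soft-DTW-style cross-modal alignment cost between video clip embeddings and narration step embeddings. It illustrates the general technique only; the cosine cost, the `gamma` smoothing parameter, and the length normalization are assumptions, not PeskaVLP's exact loss.

```python
import torch

def soft_dtw_alignment_loss(video_emb, text_emb, gamma=0.1):
    """video_emb: (T, d) clip embeddings; text_emb: (S, d) step embeddings."""
    # Cosine cost matrix between T video clips and S narration steps.
    v = torch.nn.functional.normalize(video_emb, dim=-1)
    t = torch.nn.functional.normalize(text_emb, dim=-1)
    cost = 1.0 - v @ t.T                                  # (T, S)
    T, S = cost.shape
    inf = torch.tensor(float("inf"))
    # R[i][j]: soft-min cumulative cost of aligning the first i clips
    # with the first j steps (monotonic, order-preserving alignment).
    R = [[inf] * (S + 1) for _ in range(T + 1)]
    R[0][0] = torch.tensor(0.0)
    for i in range(1, T + 1):
        for j in range(1, S + 1):
            prev = torch.stack([R[i - 1][j - 1], R[i - 1][j], R[i][j - 1]])
            # Differentiable soft-minimum via log-sum-exp.
            R[i][j] = cost[i - 1, j - 1] - gamma * torch.logsumexp(-prev / gamma, dim=0)
    return R[T][S] / (T + S)                              # length-normalized
```

In a contrastive setup of the kind the abstract describes, such a cost would presumably be minimized for matched video-text pairs and pushed up for the constructed hard negatives.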


SurgicAI: A Hierarchical Platform for Fine-Grained Surgical Policy Learning and Benchmarking

Neural Information Processing Systems

Despite advancements in robotic-assisted surgery, automating complex tasks like suturing remains challenging due to the need for adaptability and precision. Learning-based approaches, particularly reinforcement learning (RL) and imitation learning (IL), require realistic simulation environments for efficient data collection. However, current platforms often include only relatively simple, non-dexterous manipulations and lack the flexibility required for effective learning and generalization. We introduce SurgicAI, a novel platform for development and benchmarking that addresses these challenges by providing the flexibility to accommodate both modular subtasks and, more importantly, task decomposition in RL-based surgical robotics. Compatible with the da Vinci Surgical System, SurgicAI offers a standardized pipeline for collecting and utilizing expert demonstrations. It supports the deployment of multiple RL and IL approaches and the training of both singular and compositional subtasks in suturing scenarios, featuring high dexterity and modularization. Meanwhile, SurgicAI sets clear metrics and benchmarks for assessing learned policies. We implemented and evaluated multiple RL and IL algorithms on SurgicAI; our detailed benchmark analysis underscores its potential to advance policy learning in surgical robotics.
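The subtask-decomposition idea can be illustrated with a generic Gymnasium-style sketch: a policy is trained per subtask, then evaluated on the chained long-horizon task. SurgicAI's real API is not reproduced here; the class names, 7-dimensional spaces, and success condition are all illustrative assumptions.

```python
import gymnasium as gym
import numpy as np

class DummySubtask(gym.Env):
    """Stand-in for one suturing subtask (e.g. a needle-grasp phase)."""
    def __init__(self, name, horizon=50):
        self.name, self.horizon, self.t = name, horizon, 0
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(7,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(7,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        return self.observation_space.sample(), {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.horizon        # pretend success at horizon
        return (self.observation_space.sample(), float(terminated),
                terminated, False, {"subtask": self.name})

class SubtaskSequence(gym.Env):
    """Chains subtask environments so policies learned per subtask can be
    benchmarked on the composed long-horizon task."""
    def __init__(self, subtasks):
        self.subtasks, self.idx = subtasks, 0
        self.observation_space = subtasks[0].observation_space
        self.action_space = subtasks[0].action_space

    def reset(self, *, seed=None, options=None):
        self.idx = 0
        return self.subtasks[0].reset(seed=seed)

    def step(self, action):
        obs, rew, term, trunc, info = self.subtasks[self.idx].step(action)
        if term and self.idx < len(self.subtasks) - 1:
            self.idx += 1                          # hand off to the next subtask
            obs, _ = self.subtasks[self.idx].reset()
            term = False
        return obs, rew, term, trunc, info

env = SubtaskSequence([DummySubtask("grasp"), DummySubtask("insert"), DummySubtask("tie")])
```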




Temporal Causal Mediation through a Point Process: Direct and Indirect Effects of Healthcare Interventions

Neural Information Processing Systems

Deciding on an appropriate intervention requires a causal model of the treatment, the outcome, and potential mediators. Causal mediation analysis lets us distinguish between direct and indirect effects of an intervention, but it has mostly been studied in a static setting. In healthcare, data come in the form of complex, irregularly sampled time series with dynamic interdependencies between treatments, outcomes, and mediators across time. Existing approaches to dynamic causal mediation analysis are limited to regular measurement intervals and simple parametric models, and they disregard long-range mediator-outcome interactions. To address these limitations, we propose a non-parametric mediator-outcome model in which the mediator is assumed to be a temporal point process that interacts with the outcome process. With this model, we estimate the direct and indirect effects of an external intervention on the outcome, showing how each affects the whole future trajectory. We demonstrate on semi-synthetic data that our method can accurately estimate direct and indirect effects.
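For readers unfamiliar with the direct/indirect split, the following is a schematic decomposition for an intervention $a$ versus baseline $a'$, indexed by time $t$ to reflect the whole future trajectory. The notation is standard natural-effects mediation notation ($Y_{[a, M(a')]}$ denotes the outcome under treatment $a$ with the mediator process following its course under $a'$), not necessarily the paper's exact formulation.

```latex
\begin{align*}
\mathrm{TE}(t)  &= \mathbb{E}\big[Y_{[a,\,M(a)]}(t)\big]  - \mathbb{E}\big[Y_{[a',\,M(a')]}(t)\big] \\
\mathrm{NDE}(t) &= \mathbb{E}\big[Y_{[a,\,M(a')]}(t)\big] - \mathbb{E}\big[Y_{[a',\,M(a')]}(t)\big] \\
\mathrm{NIE}(t) &= \mathbb{E}\big[Y_{[a,\,M(a)]}(t)\big]  - \mathbb{E}\big[Y_{[a,\,M(a')]}(t)\big]  \\
\mathrm{TE}(t)  &= \mathrm{NDE}(t) + \mathrm{NIE}(t)
\end{align*}
```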



Supplementary Material for Text Promptable Surgical Instrument Segmentation with Vision-Language Models
Meng Wei

Neural Information Processing Systems

In this supplementary material, we include additional method details, experimental results and analysis, and visualizations that could not be accommodated in the main text due to space constraints. Below, we provide the surgical instrument prompts generated using OpenAI GPT-4 [8] and Google Bard [2]; they are used in our experiments section.

OpenAI GPT-4 based prompts. The input template for OpenAI GPT-4 is defined as: "Please describe the appearance of [class_name] in endoscopic surgery, and change the description to a phrase with subject, and not use colons." We obtain the following prompts for different surgical instruments: Bipolar forceps.
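A minimal sketch of how such prompts could be generated programmatically with the template quoted above is shown below. The OpenAI client usage, the model name "gpt-4", and the instrument list are assumptions for illustration, not the authors' original script.

```python
from openai import OpenAI

# Template quoted from the supplement; [class_name] becomes a format field here.
TEMPLATE = ("Please describe the appearance of {class_name} in endoscopic "
            "surgery, and change the description to a phrase with subject, "
            "and not use colons.")

# Illustrative instrument classes; only "bipolar forceps" appears in the text above.
INSTRUMENTS = ["bipolar forceps", "prograsp forceps", "large needle driver"]

client = OpenAI()  # expects OPENAI_API_KEY in the environment

prompts = {}
for name in INSTRUMENTS:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": TEMPLATE.format(class_name=name)}],
    )
    prompts[name] = resp.choices[0].message.content.strip()
```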